74 research outputs found

    Efficient learning in Approximate Bayesian Computation

    Get PDF

    Operator norm convergence of spectral clustering on level sets

    Full text link
    Following Hartigan, a cluster is defined as a connected component of the t-level set of the underlying density, i.e., the set of points for which the density is greater than t. We propose a clustering algorithm that combines a density estimate with spectral clustering techniques. Our algorithm is composed of two steps. First, a nonparametric density estimate is used to extract the data points for which the estimated density takes a value greater than t. Next, the extracted points are clustered based on the eigenvectors of a graph Laplacian matrix. Under mild assumptions, we prove the almost sure convergence, in operator norm, of the empirical graph Laplacian operator associated with the algorithm. Furthermore, we describe the typical behavior of the representation of the dataset in the feature space, which establishes the strong consistency of the proposed algorithm.
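
    The two-step pipeline described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: scikit-learn's KernelDensity and SpectralClustering stand in for the paper's density estimator and graph Laplacian construction, and the bandwidth, threshold t, and cluster count are arbitrary choices.

        import numpy as np
        from sklearn.cluster import SpectralClustering
        from sklearn.neighbors import KernelDensity

        def level_set_spectral_clustering(X, t, n_clusters, bandwidth=0.5):
            # Step 1: nonparametric density estimate; keep the points whose
            # estimated density exceeds the level t.
            kde = KernelDensity(bandwidth=bandwidth).fit(X)
            density = np.exp(kde.score_samples(X))
            kept = X[density > t]
            # Step 2: spectral clustering of the extracted points, i.e.
            # clustering based on the eigenvectors of a graph Laplacian
            # built on a similarity graph (RBF affinity here).
            labels = SpectralClustering(n_clusters=n_clusters,
                                        affinity="rbf").fit_predict(kept)
            return kept, labels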

    The Normalized Graph Cut and Cheeger Constant: from Discrete to Continuous

    Full text link
    Let M be a bounded domain of a Euclidean space with smooth boundary. We relate the Cheeger constant of M to the conductance of a neighborhood graph defined on a random sample from M. By restricting the minimization defining the latter to a particular class of subsets, we obtain consistency (after normalization) as the sample size increases, and we show that any minimizing sequence of subsets has a subsequence converging to a Cheeger set of M.
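
    For reference, the two quantities being related are, in standard notation (our reconstruction; the abstract does not restate them), the Cheeger constant of the domain M and the conductance of a graph G = (V, E):

        \[
        h(M) = \inf_{S \subset M}
               \frac{\operatorname{Per}(S)}
                    {\min\bigl(\operatorname{vol}(S),\, \operatorname{vol}(M \setminus S)\bigr)},
        \qquad
        \phi(G) = \min_{\emptyset \neq A \subsetneq V}
                  \frac{\operatorname{cut}(A, V \setminus A)}
                       {\min\bigl(\operatorname{vol}(A),\, \operatorname{vol}(V \setminus A)\bigr)},
        \]

    where Per(S) is the perimeter of S inside M, vol denotes Lebesgue measure for subsets of M and the sum of vertex degrees for subsets of V, and cut(A, V \ A) counts the edges crossing the partition.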

    Resampling: an improvement of Importance Sampling in varying population size models

    Get PDF
    Sequential importance sampling algorithms have been defined to estimate likelihoods in models of ancestral population processes. However, these algorithms are based on features of models with constant population size and become inefficient when the population size varies in time, making likelihood-based inference difficult in many demographic situations. In this work, we modify a previous sequential importance sampling algorithm to improve the efficiency of the likelihood estimation. Our procedure is still based on features of the constant-size model, but uses a resampling technique with a new resampling probability distribution that depends on the pairwise composite likelihood. We tested our algorithm, called sequential importance sampling with resampling (SISR), on simulated data sets under different demographic scenarios. In most cases, the computational cost was divided by two for the same inference accuracy, and in some cases by one hundred. This study provides the first assessment of the impact of such resampling techniques on parameter inference using sequential importance sampling, and it extends the range of situations where likelihood inference can easily be performed.
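
    The generic shape of one resampling step is easy to sketch. Below is a minimal illustration, not the SISR algorithm itself: the resampling distribution is passed in as an argument, whereas in the paper it would be built from the pairwise composite likelihood; the weight correction keeps the importance-sampling estimator unbiased.

        import numpy as np

        def resample(particles, weights, resampling_probs, rng):
            # Draw n particle indices from the chosen resampling
            # distribution (multinomial resampling).
            n = len(particles)
            idx = rng.choice(n, size=n, p=resampling_probs)
            # Correct the importance weights so that expectations under
            # the resampled system match those of the original one.
            new_weights = weights[idx] / (n * resampling_probs[idx])
            return [particles[i] for i in idx], new_weights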

    Bayesian functional linear regression with sparse step functions

    Full text link
    The functional linear regression model is a common tool for determining the relationship between a scalar outcome and a functional predictor seen as a function of time. This paper focuses on the Bayesian estimation of the support of the coefficient function. To this aim, we propose a parsimonious and adaptive decomposition of the coefficient function as a step function, together with a model and prior distribution that we name Bayesian functional Linear regression with Sparse Step functions (Bliss). The aim of the method is to recover the periods of time that most influence the outcome. A Bayes estimator of the support is built with a specific loss function, as well as two Bayes estimators of the coefficient function, one smooth and one a step function. The performance of the proposed methodology is analysed on various synthetic datasets and illustrated on a black Périgord truffle dataset to study the influence of rainfall on production.
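
    To see why a step-function decomposition is parsimonious, note that it reduces the functional linear predictor, the integral of x(t) times the coefficient function, to a weighted sum over a few intervals. The sketch below is purely illustrative; the intervals and heights are made up, whereas in Bliss they would be inferred under the prior.

        import numpy as np

        def step_coefficient(grid, intervals, heights):
            # beta(t) = sum_k b_k * 1[t in I_k], a sparse step function.
            beta = np.zeros_like(grid)
            for (lo, hi), b in zip(intervals, heights):
                beta[(grid >= lo) & (grid < hi)] = b
            return beta

        grid = np.linspace(0.0, 1.0, 200)
        beta = step_coefficient(grid, intervals=[(0.2, 0.35), (0.7, 0.8)],
                                heights=[1.5, -0.8])
        x = np.sin(2 * np.pi * grid)                    # toy functional predictor
        y_hat = np.sum(x * beta) * (grid[1] - grid[0])  # Riemann sum of x * beta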

    Approximate Bayesian Computational methods

    Full text link
    Also known as likelihood-free methods, approximate Bayesian computational (ABC) methods have emerged over the past ten years as the most satisfactory approach to intractable likelihood problems, first in genetics and then in a broader spectrum of applications. However, these methods suffer to some degree from calibration difficulties that make them rather volatile in their implementation, and thus suspect in the eyes of users of more traditional Monte Carlo methods. In this survey, we study the various improvements and extensions made to the original ABC algorithm in recent years.
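
    The original ABC algorithm referred to here is the rejection sampler, which fits in a few lines. This is a generic sketch with the prior, simulator, and distance supplied by the user; the tolerance eps and the toy normal-mean example are arbitrary.

        import numpy as np

        def abc_rejection(observed, prior_sample, simulate, distance, eps, n):
            # Keep a draw from the prior whenever the dataset simulated
            # from it falls within eps of the observed data.
            accepted = []
            while len(accepted) < n:
                theta = prior_sample()
                if distance(simulate(theta), observed) <= eps:
                    accepted.append(theta)
            return np.array(accepted)

        rng = np.random.default_rng(0)
        posterior_draws = abc_rejection(
            observed=1.3,
            prior_sample=lambda: rng.normal(0.0, 5.0),
            simulate=lambda th: rng.normal(th, 1.0, size=50).mean(),
            distance=lambda a, b: abs(a - b),
            eps=0.1, n=500)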

    Clustering by Estimation of Density Level Sets at a Fixed Probability

    No full text
    In density-based clustering methods, the clusters are defined as the connected components of the upper level sets of the underlying density $f$. In this setting, the practitioner fixes a probability $p$ and associates with it a threshold $t^{(p)}$ such that the level set $\{f \geq t^{(p)}\}$ has probability $p$ with respect to the distribution induced by $f$. This paper is devoted to the estimation of the threshold $t^{(p)}$, of the level set $\{f \geq t^{(p)}\}$, as well as of the number $k(t^{(p)})$ of connected components of this level set. Given a nonparametric density estimate $\hat f_n$ of $f$ based on an i.i.d. $n$-sample drawn from $f$, we first propose a computationally simple estimate $t_n^{(p)}$ of $t^{(p)}$, and we establish a concentration inequality for this estimate. Next, we consider the plug-in level set estimate $\{\hat f_n \geq t_n^{(p)}\}$, and we establish the exact convergence rate of the Lebesgue measure of the symmetric difference between $\{f \geq t^{(p)}\}$ and $\{\hat f_n \geq t_n^{(p)}\}$. Finally, we propose a computationally simple graph-based estimate of $k(t^{(p)})$, which is shown to be consistent. Thus, the methodology yields a complete procedure for analyzing the grouping structure of the data as $p$ varies over $(0,1)$.
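
    The abstract does not spell out the estimate $t_n^{(p)}$, but a natural computationally simple candidate follows from the definition: since the level set $\{f \geq t^{(p)}\}$ carries mass $p$, $t^{(p)}$ is the $(1-p)$-quantile of the distribution of $f(X)$, which suggests the plug-in below. The kernel estimator and bandwidth are illustrative stand-ins, not necessarily the paper's construction.

        import numpy as np
        from sklearn.neighbors import KernelDensity

        def threshold_estimate(X, p, bandwidth=0.5):
            # Evaluate the density estimate at the sample points and take
            # the empirical (1 - p)-quantile, so that the level set
            # {f_hat >= t_n} captures roughly a fraction p of the mass.
            kde = KernelDensity(bandwidth=bandwidth).fit(X)
            fhat = np.exp(kde.score_samples(X))
            return np.quantile(fhat, 1.0 - p)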

    Efficient learning in ABC algorithms

    Full text link
    Approximate Bayesian Computation has been successfully used in population genetics to bypass the calculation of the likelihood. These methods provide accurate estimates of the posterior distribution by comparing the observed dataset to a sample of datasets simulated from the model. Although parallelization is easily achieved, the computation time needed to ensure a suitable approximation quality of the posterior distribution remains high. To alleviate the computational burden, we propose an adaptive, sequential algorithm that runs faster than other ABC algorithms while maintaining the accuracy of the approximation. The proposal relies on the sequential Monte Carlo sampler of Del Moral et al. (2012) but is calibrated to reduce the number of simulations from the model. The paper concludes with numerical experiments on a toy example and on a population genetic study of Apis mellifera, where our algorithm is shown to be faster than traditional ABC schemes.
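
    A key ingredient of such adaptive ABC-SMC samplers is the tolerance schedule. One common rule, in the spirit of Del Moral et al.'s adaptive scheme and sketched here only as an illustration (not necessarily this paper's calibration), lowers the tolerance to a weighted quantile of the current particles' distances, so that a fixed fraction of the weighted particle population survives each round.

        import numpy as np

        def next_tolerance(distances, weights, alpha=0.5):
            # Weighted alpha-quantile of the particle-to-data distances:
            # the next tolerance keeps roughly a fraction alpha of the
            # current weighted particles.
            order = np.argsort(distances)
            cdf = np.cumsum(weights[order]) / np.sum(weights)
            return distances[order][np.searchsorted(cdf, alpha)]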
    • …